Big Data: Algorithms, Analytics, and Applications by Li Kuan-Ching Jiang Hai Yang Laurence T. Cuzzocrea Alfredo

Big Data: Algorithms, Analytics, and Applications by Li Kuan-Ching Jiang Hai Yang Laurence T. Cuzzocrea Alfredo

Author:Li, Kuan-Ching,Jiang, Hai,Yang, Laurence T.,Cuzzocrea, Alfredo [Li, Kuan-Ching,Jiang, Hai,Yang, Laurence T.,Cuzzocrea, Alfredo]
Language: eng
Format: epub, pdf
Published: 2015-01-16T12:32:13+00:00


k

m

= ln2

≈ 0 6

.

.

n

n

If each hash function is perfectly independent of all others, then the probability of a bit

remaining 0 after n elements is

kn

− kn

p 1 1

e m

=

.

m

The FP—an important performance metric of a Bloom filter—is then

− kn k

p

k

(1

) (

1

m

1

)

FP =

− p ≈ − e

,

k

2

for the optimal k. Note with that increasing k, the probability of an FP actually is supposed to decrease, which is an unintuitive outcome because one would expect the filter to get

filled up with keys earlier.

Streaming Algorithms for Big Data Processing on Multicore Architecture ◾ 227

Let us analyze k. For the majority of cases, m ≪ n, which means that the optimal number of hash functions is 1. Two functions are feasible only with m > 2.5 n. In most realistic cases, this is almost never so, because n is normally huge, while m is something practical like 24 or 32 (bits).

12.5.2 Unconventional Bloom Filter Designs for Data Streams

Based on Section 12.5.1, the obvious problem in Bloom filters is how to improve their flex-

ibility. As a side note, such Bloom filters are normally referred to as dynamic.

Figure 12.3 shows the generic model, which applies to most of the proposals of dynamic

Bloom filters. The simple idea is to replace a simple bit string with a richer data structure

(the change in the Bloom filter in the figure). Each bit in the filter now simply is a pointer

to a structure that supports dynamic operations.

The other change that ensues is that the OR operation is no longer applicable. Instead, a

nontrivial manipulation has to be performed on each bit of the value that was supposed to be

OR-ed in the traditional design. Natural y, this incurs a considerable overhead on performance.

The following classes of dynamic Bloom filters are found in the literature.

• Stop additions filter. This filter will stop accepting new keys beyond a given point.

Obviously, this is done in order to keep the FP beyond a given target value.

• Deletion filter. This filter is tricky to build, but if accomplished, it can revert to a given previous state by forgetting the change introduced by a given key.

• Counting filter. This filter can count both individual bits of potential occurrences and

entire values—combinations of bits. This particular class of filters obviously can find

practical applications in data streaming. In fact, the existing example of the d-left

hashing method in Reference 32 uses a kind of counting Bloom filter [33]. Another

example can be found in Reference 34, where it is used roughly for the same purpose.

Each

Hash

Bloom

in turn

keys

filter

1

1

* State

1

1

A nontrivial

0

0

manipulation

* …

0

0

1

1

*

1

1

It

em

Multiple

0

0

*

hash



Download



Copyright Disclaimer:
This site does not store any files on its server. We only index and link to content provided by other sites. Please contact the content providers to delete copyright contents if any and email us, we'll remove relevant links or contents immediately.